Vietnamese Text Classification with TextRank and Jaccard Similarity Coefficient
Authors
Abstract
Similar resources
Unilateral Jaccard Similarity Coefficient
Similarity measures are essential for solving many pattern recognition problems, such as classification, clustering, and retrieval. Various similarity measures are categorized by both syntactic and semantic relationships. In this paper, we present a novel similarity measure, the Unilateral Jaccard Similarity Coefficient (uJaccard), which not only takes into consideration the space between two points b...
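For reference, the standard (symmetric) Jaccard coefficient that the name uJaccard refers to is |A ∩ B| / |A ∪ B|. The sketch below computes it over token sets; it is not the unilateral variant described in the abstract, whose definition is truncated here. The whitespace tokenizer `tokens` is a hypothetical helper for illustration only: Vietnamese whitespace separates syllables rather than words, so a real Vietnamese pipeline would typically add word segmentation first.

```python
def tokens(text: str) -> set:
    # Hypothetical helper: lowercase and split on whitespace.
    # For Vietnamese this yields syllables, not words; word segmentation is out of scope here.
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Standard Jaccard coefficient |A ∩ B| / |A ∪ B|; defined here as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    s1 = tokens("the cat sat on the mat")
    s2 = tokens("the cat lay on the rug")
    print(jaccard(s1, s2))  # 3 shared tokens / 7 distinct tokens = 3/7
```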
Efficient Identification of Tanimoto Nearest Neighbors All Pairs Similarity Search Using the Extended Jaccard Coefficient
Tanimoto, or extended Jaccard, is an important similarity measure which has seen prominent use in fields such as data mining and chemoinformatics. Many of the existing state-of-the-art methods for market basket analysis, plagiarism and anomaly detection, compound database search, and ligand-based virtual screening rely heavily on identifying Tanimoto nearest neighbors. Given the rapidly increas...
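The extended Jaccard (Tanimoto) coefficient of two real-valued vectors is x·y / (‖x‖² + ‖y‖² - x·y), which reduces to the set Jaccard coefficient when the vectors are binary. As a baseline only, the sketch below finds each vector's Tanimoto nearest neighbor by brute force with O(n²) comparisons; it does not implement the efficient search described in the abstract. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def tanimoto(x: Sequence[float], y: Sequence[float]) -> float:
    """Extended Jaccard (Tanimoto) coefficient: x.y / (|x|^2 + |y|^2 - x.y)."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # denom is zero only when both vectors are all zeros; treat them as identical.
    return dot / denom if denom else 1.0


def nearest_neighbors(vectors: List[Sequence[float]]) -> List[Tuple[int, float]]:
    """Brute-force baseline: for each vector, the index and score of its most similar other vector."""
    result = []
    for i, v in enumerate(vectors):
        best_j, best_s = -1, float("-inf")
        for j, w in enumerate(vectors):
            if i == j:
                continue
            s = tanimoto(v, w)
            if s > best_s:
                best_j, best_s = j, s
        result.append((best_j, best_s))
    return result


if __name__ == "__main__":
    # On binary vectors Tanimoto equals set Jaccard: |{1}| / |{0, 1, 2}| = 1/3.
    print(tanimoto([1, 1, 0], [0, 1, 1]))
    print(nearest_neighbors([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))
```

Practical all-pairs methods avoid most of these n² comparisons; this version is only useful as a correctness reference for small inputs.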
Kernels and Similarity Measures for Text Classification
Measuring similarity between two strings is a fundamental step in text classification and other problems of information retrieval. Recently, kernel-based methods have been proposed for this task; since kernels are inner products in a feature space, they naturally induce similarity measures. Information theoretic (dis)similarities have also been the subject of recent research. This paper describ...
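The remark that kernels, being inner products in a feature space, induce similarity measures can be made concrete with one well-known string kernel: the p-spectrum kernel, the inner product of the count vectors of length-p substrings. Normalizing it as k(s, t) / sqrt(k(s, s) · k(t, t)) gives the cosine of the angle between the two strings in feature space. This is an illustrative choice, not necessarily one of the kernels studied in the cited paper; p = 3 and the function names are assumptions.

```python
import math
from collections import Counter


def spectrum_features(s: str, p: int = 3) -> Counter:
    """Feature map of the p-spectrum kernel: counts of every contiguous length-p substring."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s: str, t: str, p: int = 3) -> int:
    """Inner product of the two p-gram count vectors."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    # Counter returns 0 for missing keys, so unshared p-grams contribute nothing.
    return sum(c * ft[g] for g, c in fs.items())


def kernel_similarity(s: str, t: str, p: int = 3) -> float:
    """Normalized kernel k(s,t) / sqrt(k(s,s) k(t,t)); lies in [0, 1] for count features."""
    kss, ktt = spectrum_kernel(s, s, p), spectrum_kernel(t, t, p)
    if kss == 0 or ktt == 0:  # a string shorter than p has no features
        return 0.0
    return spectrum_kernel(s, t, p) / math.sqrt(kss * ktt)


if __name__ == "__main__":
    print(kernel_similarity("abcd", "abce"))  # one shared trigram ("abc") out of two each
    print(kernel_similarity("abc", "xyz"))    # no shared trigrams
```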
Detecting Zero-day Polymorphic Worms with Jaccard Similarity Algorithm
Zero-day polymorphic worms pose a serious threat to the security of mobile systems and Internet infrastructure. In many cases, it is difficult to detect worm attacks at an early stage, and there is typically little or no time to develop a well-constructed solution during such an outbreak. This is because the worms act only to spread from node to node, and they bring security concerns to everyone...
Variations of the Similarity Function of TextRank for Automated Summarization
This article presents new alternatives to the similarity function of the TextRank algorithm for automated text summarization. We describe the algorithm in general terms and the different functions we propose. Some of these variants achieve a significant improvement using the same metrics and dataset as the original publication.
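Because the abstract above turns on the similarity function inside TextRank, a rough sketch may help. The Python below ranks tokenized sentences with weighted PageRank over a complete sentence graph, using the sentence-overlap similarity of the original TextRank formulation (shared words divided by log|S_i| + log|S_j|) with the conventional damping factor d = 0.85. Since the main paper's title pairs TextRank with the Jaccard coefficient, a Jaccard similarity is included as a drop-in alternative; how that paper actually combines the two for Vietnamese text classification is not stated on this page, and this is neither its method nor the cited article's variants. Function names, the iteration limit, and the tolerance are assumptions.

```python
import math
from typing import Callable, List

Sentence = List[str]
Similarity = Callable[[Sentence, Sentence], float]


def overlap_similarity(a: Sentence, b: Sentence) -> float:
    """Original TextRank sentence similarity: shared words / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences are one word long
        return 0.0
    return len(set(a) & set(b)) / denom


def jaccard_similarity(a: Sentence, b: Sentence) -> float:
    """Drop-in alternative: Jaccard coefficient of the two word sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def textrank(sentences: List[Sentence], sim: Similarity = overlap_similarity,
             d: float = 0.85, max_iter: int = 100, tol: float = 1e-6) -> List[float]:
    """Weighted PageRank over a complete sentence graph whose edge weights come from `sim`."""
    n = len(sentences)
    if n == 0:
        return []
    w = [[sim(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]  # total outgoing weight of each sentence
    scores = [1.0] * n
    for _ in range(max_iter):
        # WS(i) = (1 - d) + d * sum_j  w[j][i] / out_weight[j] * WS(j)
        new_scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


if __name__ == "__main__":
    doc = [s.split() for s in ["a b c d", "a b x y", "c d z w"]]
    print(textrank(doc))                          # sentence 0 links both others, so it ranks first
    print(textrank(doc, sim=jaccard_similarity))
```

Because each edge weight is divided by the source sentence's total outgoing weight, rescaling a similarity function by a constant (for example, changing the base of the logarithm) leaves the scores unchanged; only the relative weighting matters, and that is what swapping in a different similarity function changes.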
Journal
Journal title: Advances in Science, Technology and Engineering Systems Journal
Year: 2020
ISSN: 2415-6698
DOI: 10.25046/aj050644